Causal models for qualitative and mixed methods inference

Mixed methods

Macartan Humphreys and Alan Jacobs

1 Mixed Methods

1.1 Population-level causal questions

  • Average causal effects for a population
    • e.g., What is the average effect of \(X\) on \(Y\)?
  • Proportion of different effects in a population
    • What share of cases in the population have positive effects?
    • What share have negative effects?
  • Causal pathways
    • e.g., How commonly does \(X\) affect \(Y\) through \(M\) (vs. \(N\)) in the population?

1.2 Causal queries on a DAG: population-level average causal effect

  • What is the average effect of \(X\) on \(Y\) in the population?
    • This is a question about the values in \(\lambda^Y\)

1.3 Causal queries on a DAG: population-level average causal effect

  • With binary variables, the average effect is always the difference between
    • The share of cases with a positive effect
    • The share of cases with a negative effect
  • \(\lambda^Y_{01} - \lambda^Y_{10}\)

1.4 Causal queries on a DAG: population-level average causal effect

\(ATE = \lambda^Y_{01} - \lambda^Y_{10}\)
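For example, with made-up illustrative shares \(\lambda^Y_{01} = 0.5\) and \(\lambda^Y_{10} = 0.1\), we get \(ATE = 0.5 - 0.1 = 0.4\): a positive average effect even though a tenth of cases experience negative effects.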

1.5 ATE with a mediator

We need to

  • Combine two ways of generating positive effects
  • Combine two ways of generating negative effects
  • Subtract the second from the first

1.6 ATE with a mediator

  • Two ways of generating positive effects: \(\lambda^M_{01} \times \lambda^Y_{01} + \lambda^M_{10} \times \lambda^Y_{10}\)
  • Two ways of generating negative effects: \(\lambda^M_{10} \times \lambda^Y_{01} + \lambda^M_{01} \times \lambda^Y_{10}\)
  • Subtract the second from the first
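This arithmetic can be sketched in a few lines (the \(\lambda\) shares below are made-up for illustration, and node types are assumed independent, as in the model here):

```python
def mediated_ate(lam_M, lam_Y):
    """ATE of X on Y through mediator M, given type shares for each node.

    lam_M and lam_Y map type labels to population shares:
    '01' = positive effect, '10' = negative effect,
    '00' / '11' = no effect (output fixed at 0 / 1).
    """
    # Two ways of generating a positive X -> Y effect
    positive = lam_M['01'] * lam_Y['01'] + lam_M['10'] * lam_Y['10']
    # Two ways of generating a negative X -> Y effect
    negative = lam_M['10'] * lam_Y['01'] + lam_M['01'] * lam_Y['10']
    # Subtract the second from the first
    return positive - negative

# Illustrative (made-up) shares
lam_M = {'00': 0.2, '01': 0.5, '10': 0.1, '11': 0.2}
lam_Y = {'00': 0.2, '01': 0.6, '10': 0.1, '11': 0.1}
print(mediated_ate(lam_M, lam_Y))  # (0.30 + 0.01) - (0.06 + 0.05) = 0.20
```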

1.7 Causal queries on a DAG: how often does \(X\) have a positive effect on \(Y\)?

  • Proportion of cases with positive effect = \(\lambda^Y_{01}\)

1.8 Causal queries on a DAG: how often does \(X\) have a positive effect on \(Y\)?

  • Proportion of cases with positive effect = \(\lambda^M_{01} \times \lambda^Y_{01} + \lambda^M_{10} \times \lambda^Y_{10}\)

1.9 Causal queries on a DAG: pathway questions

  • We can also pose the pathway query at population level
    • What is the share of cases for which \(X\) has a positive effect on \(Y\) through \(M\)?
  • A question about joint \(\lambda^M\) and \(\lambda^Y\) distributions

1.10 How will we answer questions about populations?

  • We need to learn about those \(\lambda\)’s
    • About the proportions of the population with different kinds of causal effects
  • We will have a prior belief about those proportions
  • When we see data on lots of cases, we will update those beliefs about proportions
    • From a prior distribution over \(\lambda\) to a posterior distribution over \(\lambda\)
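A toy Monte Carlo sketch of this prior-to-posterior updating (hypothetical counts, a flat Dirichlet prior over the four \(Y\)-types, and \(X\) assumed randomized; the real machinery is CausalQueries, this only illustrates the logic):

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw candidate lambda^Y vectors from a flat Dirichlet prior.
# Type order: 00 (Y always 0), 01 (positive effect), 10 (negative), 11 (Y always 1)
lam = rng.dirichlet([1, 1, 1, 1], size=100_000)
l00, l01, l10, l11 = lam.T

# Hypothetical (X, Y) counts from many cases, showing a strong positive correlation
n = {(1, 1): 40, (1, 0): 10, (0, 1): 10, (0, 0): 40}

# Under each candidate lambda, P(Y=1 | X=1) and P(Y=1 | X=0)
p1 = l01 + l11  # X=1 yields Y=1 for positive-effect and always-1 types
p0 = l10 + l11  # X=0 yields Y=1 for negative-effect and always-1 types

# Likelihood of the data under each candidate; use as importance weights
w = p1**n[1, 1] * (1 - p1)**n[1, 0] * p0**n[0, 1] * (1 - p0)**n[0, 0]
w /= w.sum()

prior_ate = np.mean(l01 - l10)       # roughly 0 under the flat prior
post_ate = np.sum(w * (l01 - l10))   # pulled toward the observed X-Y difference
print(prior_ate, post_ate)
```

The posterior puts most weight on \(\lambda\) values whose implied \(X\)-\(Y\) correlation matches the data.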

1.11 How do we “update” our models?

  • We’ve talked about process tracing a single case to answer a case-level query
    • Here the model is fixed
    • We use the model + case data to answer questions about the case
  • We can also use data to “update” our models
    • Use data on many cases to learn about causal effects in the population
  • Allows mixing methods: using data on lots of cases, we can learn about probative value of process-tracing evidence
  • The core logic: we learn by updating population-level causal beliefs toward beliefs more consistent with the data

1.12 Start with a DAG

1.13 Large-\(N\) estimation of \(ATE\): basic intuition

  • The most straightforward case
  • Suppose we collect data on \(I\) and \(D\) for a large number of cases
  • We observe a strong positive correlation
  • We will think there’s a positive average effect
    • Note: this specific model rules out confounding
    • All exogenous nodes are independently assigned
      • We will complicate this later

1.14 Now, complicate the DAG

1.15 Large-\(N\) estimation of \(ATE\): what happens to beliefs over parameters

  • Again, we collect data on \(I\) and \(D\) for a large number of cases
  • We observe a strong positive correlation
  • We will think there’s a positive average effect

1.16 Large-\(N\) estimation of \(ATE\): what happens to beliefs over parameters

  • Now, we will update on both \(\lambda^M\) and \(\lambda^D\)
  • An \(I \rightarrow D\) effect can only happen if \(I\) affects \(M\) and \(M\) affects \(D\), in specific ways
  • Two possible combinations of effects can generate a positive \(I \rightarrow D\) effect
    • \(I \rightarrow M\) is positive, \(M \rightarrow D\) is positive
    • \(I \rightarrow M\) is negative, \(M \rightarrow D\) is negative
  • So we will come to put more weight on a joint distribution of \(\lambda^M\) and \(\lambda^D\) in which there are lots of cases with one of these two combinations
  • …and less posterior weight on all other combinations of effects
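The same Monte Carlo logic extends to the mediated model (hypothetical counts again; \(M\) is unobserved here, \(I\) is assumed randomized, and the two nodes' type shares are given independent flat priors):

```python
import numpy as np

rng = np.random.default_rng(1)
size = 200_000

# Independent flat Dirichlet priors over M-types and D-types.
# Type order: 00, 01 (positive effect), 10 (negative effect), 11
lam_M = rng.dirichlet([1, 1, 1, 1], size=size)
lam_D = rng.dirichlet([1, 1, 1, 1], size=size)

# P(M=1 | I) and P(D=1 | M) under each candidate pair of shares
pM1 = lam_M[:, 1] + lam_M[:, 3]    # I = 1
pM0 = lam_M[:, 2] + lam_M[:, 3]    # I = 0
pD_M1 = lam_D[:, 1] + lam_D[:, 3]
pD_M0 = lam_D[:, 2] + lam_D[:, 3]

# Marginalize out the unobserved M: P(D=1 | I)
p1 = pM1 * pD_M1 + (1 - pM1) * pD_M0
p0 = pM0 * pD_M1 + (1 - pM0) * pD_M0

# Hypothetical I-D counts with a strong positive correlation
n = {(1, 1): 40, (1, 0): 10, (0, 1): 10, (0, 0): 40}
w = p1**n[1, 1] * (1 - p1)**n[1, 0] * p0**n[0, 1] * (1 - p0)**n[0, 0]
w /= w.sum()

# Share of cases with "linked" effects: the two combinations in the text
linked = lam_M[:, 1] * lam_D[:, 1] + lam_M[:, 2] * lam_D[:, 2]
prior_linked = np.mean(linked)
post_linked = np.sum(w * linked)
print(prior_linked, post_linked)  # posterior weight on linked effects rises
```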

1.17 Learning with confounding

  • Say we just observe a positive Inequality-Democratization correlation
  • Could be because Inequality causes Democratization
  • Could be because of confounding

1.18 Learning with confounding

  • So observing \(M\) helps!
    • Allows us to learn if \(I\) affects \(M\)
    • And if \(M\) affects \(D\)
  • Suppose we see:
    • a strong positive \(I\) to \(M\) correlation
    • a strong positive \(M\) to \(D\) correlation
  • We update toward belief that \(I\) to \(D\) effect is common, and that confounding is uncommon
  • If \(I/M\) and \(M/D\) correlations are weak/absent, then confounding is likely
  • Process data helps address the deep problem of confounding

1.19 In sum: learning from data

  • For any data pattern, we gain confidence in parameter values more consistent with the data
  • For single-case inference, we must bring background beliefs about population-level causal effects
  • For multiple cases, we can learn about effects from the data
  • Large-\(N\) data can thus provide probative value for small-\(N\) process-tracing
  • All inference is conditional on the model

1.20 Start with a DAG

  • We’ll want to learn about the \(\theta\)’s and the \(\lambda\)’s
  • We need to observe nodes to learn about other nodes
  • We can potentially observe 3 nodes here: \(X, M\), and \(Y\)

1.21 A typical “quantitative” data structure

  • Data on exogenous variables and a key outcome for many cases

  • E.g., data on inequality (\(I\)) and democracy (\(D\)) for many cases

1.22 A typical “quantitative” data structure

1.23 A typical “quantitative” data structure

  • Or maybe data on inequality, democracy, and international pressure for many cases

1.24 A typical “qualitative” data structure

  • Data on exogenous variables and a key outcome plus elements of process for a small number of cases
    • Finite resources mean tradeoffs between extensive and intensive data collection
  • E.g., data on inequality (\(I\)), mass mobilization (\(M\)), and democracy (\(D\)) for a small number of cases

1.25 A typical “qualitative” data structure

1.26 Mixing qualitative and quantitative

  • What if we combine extensive data on many cases with intensive data on a few cases?
  • A non-rectangular data structure

1.27 Non-rectangular data

  • A data structure that neither standard quantitative nor standard qualitative approaches can handle in a systematic way
  • Not a problem for the Integrated Inferences approach
  • We simply ask:
    • Which causal effects in the population are most and least consistent with the data pattern we observe?
  • That is, what distribution of causal effects in the population, for each node, are most consistent with this data pattern?
  • CausalQueries uses information wherever it finds it

1.28 How qual can inform quant: confounding

Remember:

  • Say we just observe a positive Inequality-Democratization correlation
  • Could be because Inequality causes Democratization
  • Could be because of confounding

1.29 How qual can inform quant: confounding

Remember:

  • Observing \(M\) helps
  • Process data helps address the deep problem of confounding
  • Key point: we don’t need \(M\) for all cases
  • Can learn from \(I\) and \(D\) for lots of cases and \(M\) for a subset

1.30 How qual can inform quant: observable confounder

  • Another example: \(M\) as the confound

1.31 How qual can inform quant: observable confounder

  • How much can we learn from \(M\) data for some cases?

1.32 How quant can inform qual: getting probative value of a clue from the data

  • Suppose we go to the field and we learn that mass mobilization DID occur in Malawi

    • So \(M=1\)
  • What can we conclude?

  • NOTHING YET!

1.33 How quant can inform qual: getting probative value of a clue from the data

  • The pure process-tracing solution: assign our beliefs about causal effects in the population
    • E.g., beliefs that linked positive effects are more likely than linked negative effects
    • Meaning that \(M=1\) in an \(I=1, D=1\) case speaks in favor of \(I=1\) causing \(D=1\)
  • The mixed-methods solution: learn about population-level effects from large-\(N\) data

1.34 How quant can inform qual: getting probative value of a clue from the data

  • Suppose we have data on \(I\), \(D\), and \(M\) for a large number of cases
  • Suppose we observe a strong positive correlation across all 3 variables
  • What have we learned, under this model?
    • Positive \(I \rightarrow M\) effects more likely than negative
    • Positive \(M \rightarrow D\) effects more likely than negative
  • So linked positive effects more common than linked negative effects
  • Meaning that \(M=1\) in an \(I=1, D=1\) case speaks in favor of \(I=1\) causing \(D=1\)
  • But now we’ve drawn our population-level beliefs from the data
  • Now, we can go and process-trace
    • Did high inequality cause democratization in Malawi?
    • Observe \(M\)
  • With conclusions grounded in case-level AND population-level evidence
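A sketch of how the clue then gains probative value under the chain model \(I \rightarrow M \rightarrow D\) (the shares below are made-up, standing in for values learned from large-\(N\) data; node types are assumed independent and \(I\) randomized):

```python
def p_cause(lam_M, lam_D, M=None):
    """P(I=1 caused D=1) in an I=1, D=1 case, optionally given the clue M,
    under a chain model I -> M -> D with independent type shares."""
    # '01' = positive effect, '10' = negative, '00'/'11' = output fixed at 0/1
    pM1 = lam_M['01'] + lam_M['11']    # P(M=1 | I=1)
    pD_M1 = lam_D['01'] + lam_D['11']  # P(D=1 | M=1)
    pD_M0 = lam_D['10'] + lam_D['11']  # P(D=1 | M=0)
    if M == 1:
        # M=1 rules out the negative-negative route; causation needs 01 & 01
        return (lam_M['01'] * lam_D['01']) / (pM1 * pD_M1)
    if M == 0:
        # M=0 rules out the positive-positive route; causation needs 10 & 10
        return (lam_M['10'] * lam_D['10']) / ((1 - pM1) * pD_M0)
    # M unobserved: either linked route could have produced D=1
    num = lam_M['01'] * lam_D['01'] + lam_M['10'] * lam_D['10']
    den = pM1 * pD_M1 + (1 - pM1) * pD_M0
    return num / den

# Illustrative shares in which linked positive effects dominate, as if
# learned from a strong positive I-M-D correlation in large-N data
lam_M = {'00': 0.2, '01': 0.6, '10': 0.05, '11': 0.15}
lam_D = {'00': 0.2, '01': 0.6, '10': 0.05, '11': 0.15}
print(p_cause(lam_M, lam_D))       # clue unobserved
print(p_cause(lam_M, lam_D, M=1))  # observing M=1 raises confidence in causation
```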

1.35 Application
